GPU-Based Parallel Integration of Large Numbers of Independent ODE Systems

نویسندگان

  • Kyle E. Niemeyer
  • Chih-Jen Sung
چکیده

The task of integrating a large number of independent ODE systems arises in various scientific and engineering areas. For nonstiff systems, common explicit integration algorithms can be used on GPUs, where individual GPU threads concurrently integrate independent ODEs with different initial conditions or parameters. One example is the fifth-order adaptive Runge–Kutta– Cash–Karp (RKCK) algorithm. In the case of stiff ODEs, standard explicit algorithms require impractically small time-step sizes for stability reasons, and implicit algorithms are therefore commonly used instead to allow larger time steps and reduce the computational expense. However, typical high-order implicit algorithms based on backwards differentiation formulae (e.g., VODE, LSODE) involve complex logical flow that causes severe thread divergence when implemented on GPUs, limiting the performance. Therefore, alternate algorithms are needed. A GPUbased Runge–Kutta–Chebyshev (RKC) algorithm can handle moderate levels of stiffness and performs significantly faster than not only an equivalent CPU version but also a CPU-based implicit algorithm (VODE) based on results shown in the literature. In this chapter, we present the mathematical background, implementation details, and source code for the RKCK and RKC algorithms for use integrating large numbers of independent systems of ODEs on GPUs. In addition, brief performance comparisons are shown for each algorithm, demonstrating the potential benefit of moving to GPU-based ODE integrators. ∗Final version published in Numerical Computations with GPUs (2014) Springer International Publishing, Ch. 8, pp. 159–182. V. Kindratenko (Ed.). doi:10.1007/978-3-319-06548-9_8 †Corresponding author; Email: [email protected] ar X iv :1 61 1. 02 27 4v 1 [ cs .M S] 6 N ov 2 01 6

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Ageing Orders of Series-Parallel and Parallel-Series Systems with Independent Subsystems Consisting of Dependent Components

In this paper, we consider series-parallel and parallel-series systems with independent subsystems consisting of dependent homogeneous components whose joint lifetimes are modeled by an Archimedean copula. Then, by considering two such systems with different numbers of components within each subsystem, we establish hazard rate and reversed hazard rate orderings between the two system lifetimes,...

متن کامل

Implementation of the direction of arrival estimation algorithms by means of GPU-parallel processing in the Kuda environment (Research Article)

Direction-of-arrival (DOA) estimation of audio signals is critical in different areas, including electronic war, sonar, etc. The beamforming methods like Minimum Variance Distortionless Response (MVDR), Delay-and-Sum (DAS), and subspace-based Multiple Signal Classification (MUSIC) are the most known DOA estimation techniques. The mentioned methods have high computational complexity. Hence using...

متن کامل

Fast Cellular Automata Implementation on Graphic Processor Unit (GPU) for Salt and Pepper Noise Removal

Noise removal operation is commonly applied as pre-processing step before subsequent image processing tasks due to the occurrence of noise during acquisition or transmission process. A common problem in imaging systems by using CMOS or CCD sensors is appearance of  the salt and pepper noise. This paper presents Cellular Automata (CA) framework for noise removal of distorted image by the salt an...

متن کامل

Flexible Runtime Support for Efficient Skeleton Programming on Heterogeneous GPU-based Systems

SkePU is a skeleton programming framework for multicore CPU and multi-GPU systems. StarPU is a runtime system that provides dynamic scheduling and memory management support for heterogeneous, accelerator-based systems. We have implemented support for StarPU as a possible backend for SkePU while keeping the generic SkePU interface intact. The mapping of a SkePU skeleton call to one or more StarP...

متن کامل

Parallelization of Rich Models for Steganalysis of Digital Images using a CUDA-based Approach

There are several different methods to make an efficient strategy for steganalysis of digital images. A very powerful method in this area is rich model consisting of a large number of diverse sub-models in both spatial and transform domain that should be utilized. However, the extraction of a various types of features from an image is so time consuming in some steps, especially for training pha...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1611.02274  شماره 

صفحات  -

تاریخ انتشار 2016